Overview

Dataset statistics

Number of variables: 18
Number of observations: 2651861
Missing cells: 2152547
Missing cells (%): 4.5%
Duplicate rows: 2452
Duplicate rows (%): 0.1%
Total size in memory: 364.2 MiB
Average record size in memory: 144.0 B
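The percentages above follow directly from the raw counts. Missing cells are counted over the full grid of variables × observations. Duplicate rows are counted against the number of observations. The memory figures are consistent with 8 bytes per cell, which is an assumption (a shallow memory measurement) rather than something the report states. The helper name `pct` is illustrative only.

```python
def pct(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole

N_VARIABLES = 18
N_OBSERVATIONS = 2651861

# Missing cells are counted over the whole grid of 18 x 2651861 cells.
missing_pct = pct(2152547, N_VARIABLES * N_OBSERVATIONS)
# Duplicate rows are counted against the number of observations.
duplicate_pct = pct(2452, N_OBSERVATIONS)
# Assumption: 8 bytes per cell, i.e. 18 * 8 = 144 B per record.
column_mib = N_OBSERVATIONS * 8 / 2**20
memory_mib = N_OBSERVATIONS * N_VARIABLES * 8 / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")       # Missing cells (%): 4.5%
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")    # Duplicate rows (%): 0.1%
print(f"Total size in memory: {memory_mib:.1f} MiB")  # Total size in memory: 364.2 MiB
```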

Variable types

Numeric: 10
Categorical: 8

Warnings

Dataset has 2452 (0.1%) duplicate rows [Duplicates]
Start_Time has high cardinality: 2582957 distinct values [High cardinality]
End_Time has high cardinality: 2578441 distinct values [High cardinality]
Weather_Condition has high cardinality: 121 distinct values [High cardinality]
Temperature(F) has 45294 (1.7%) missing values [Missing]
Humidity(%) has 48306 (1.8%) missing values [Missing]
Pressure(in) has 38857 (1.5%) missing values [Missing]
Visibility(mi) has 53002 (2.0%) missing values [Missing]
Wind_Direction has 40241 (1.5%) missing values [Missing]
Wind_Speed(mph) has 344548 (13.0%) missing values [Missing]
Precipitation(in) has 1529311 (57.7%) missing values [Missing]
Weather_Condition has 52931 (2.0%) missing values [Missing]
Distance(mi) is highly skewed (γ1 = 37.62493565) [Skewed]
Precipitation(in) is highly skewed (γ1 = 49.31946349) [Skewed]
Start_Time is uniformly distributed [Uniform]
End_Time is uniformly distributed [Uniform]
Distance(mi) has 2215378 (83.5%) zeros [Zeros]
Wind_Speed(mph) has 164999 (6.2%) zeros [Zeros]
Precipitation(in) has 937421 (35.3%) zeros [Zeros]
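The SKEWED and HIGH CARDINALITY flags behave like simple threshold rules. The sketch below uses a skewness threshold of 20 and a distinct-count threshold of 50 for categorical variables. Both values are assumptions. They reproduce the flags above from the per-variable statistics in this report, but they were not read from `config.yaml`.

```python
# Per-variable statistics copied from this report.
SKEWNESS = {
    "TMC": 5.26517742,
    "Start_Lat": 0.08573729209,
    "Start_Lng": -0.6380036073,
    "Distance(mi)": 37.62493565,
    "Temperature(F)": -0.5091188724,
    "Humidity(%)": -0.4242431196,
    "Pressure(in)": -5.191468097,
    "Visibility(mi)": 2.924693413,
    "Wind_Speed(mph)": 13.54744091,
    "Precipitation(in)": 49.31946349,
}
DISTINCT = {
    "Severity": 4,
    "Start_Time": 2582957,
    "End_Time": 2578441,
    "Side": 3,
    "State": 49,
    "Wind_Direction": 24,
    "Weather_Condition": 121,
    "Sunrise_Sunset": 2,
}

# Assumed thresholds, chosen because they match the flags in this report.
SKEWNESS_THRESHOLD = 20
CARDINALITY_THRESHOLD = 50

def flag_skewed(skewness, threshold=SKEWNESS_THRESHOLD):
    """Names of variables whose |skewness| exceeds the threshold."""
    return sorted(name for name, g1 in skewness.items() if abs(g1) > threshold)

def flag_high_cardinality(distinct, threshold=CARDINALITY_THRESHOLD):
    """Names of categorical variables with more distinct values than the threshold."""
    return sorted(name for name, n in distinct.items() if n > threshold)

print(flag_skewed(SKEWNESS))             # ['Distance(mi)', 'Precipitation(in)']
print(flag_high_cardinality(DISTINCT))   # ['End_Time', 'Start_Time', 'Weather_Condition']
```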

Reproduction

Analysis started: 2021-05-02 17:00:37.590588
Analysis finished: 2021-05-02 17:08:42.934028
Duration: 8 minutes and 5.34 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

TMC
Real number (ℝ≥0)

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 208.2827052
Minimum: 200
Maximum: 406
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 200
5th percentile: 201
Q1: 201
Median: 201
Q3: 201
95th percentile: 241
Maximum: 406
Range: 206
Interquartile range (IQR): 0
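Q1, the median and Q3 are sample quantiles. For TMC, value 201 alone accounts for 83.5% of rows and only 66 rows lie below it, so every quantile from the 5th to the 75th percentile lands on 201 and the IQR is 0. The sketch below uses linear interpolation between order statistics, which is the default method of `pandas.Series.quantile`. It is an assumption that the report uses that method. `quantile_linear` is a hypothetical helper.

```python
import math

def quantile_linear(values, q):
    """Quantile q (0 <= q <= 1) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q          # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

data = [1, 2, 3, 4]
q1 = quantile_linear(data, 0.25)
q3 = quantile_linear(data, 0.75)
print(q1, q3, q3 - q1)  # 1.75 3.25 1.5
```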

Descriptive statistics

Standard deviation: 21.24599285
Coefficient of variation (CV): 0.1020055546
Kurtosis: 39.42717498
Mean: 208.2827052
Median Absolute Deviation (MAD): 0
Skewness: 5.26517742
Sum: 552336783
Variance: 451.3922124
Monotonicity: Not monotonic
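The coefficient of variation is the standard deviation divided by the mean. The variance is the square of the standard deviation. Recomputing both from the rounded values above is a quick consistency check. The sketch also uses Start_Lng, whose negative mean gives a negative CV.

```python
# Standard deviations and means copied from the report.
TMC_STD, TMC_MEAN = 21.24599285, 208.2827052
LNG_STD, LNG_MEAN = 16.23687298, -93.3411788

def coefficient_of_variation(std, mean):
    """CV = standard deviation / mean; its sign follows the sign of the mean."""
    return std / mean

print(round(coefficient_of_variation(TMC_STD, TMC_MEAN), 6))  # 0.102006
print(round(coefficient_of_variation(LNG_STD, LNG_MEAN), 6))  # -0.173952
print(round(TMC_STD ** 2, 4))                                 # 451.3922
```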
[Figure: Histogram with fixed size bins (bins=21)]
Common values

Value | Count | Frequency (%)
201 | 2215577 | 83.5%
241 | 272196 | 10.3%
245 | 50124 | 1.9%
229 | 22863 | 0.9%
203 | 18010 | 0.7%
222 | 13323 | 0.5%
244 | 12982 | 0.5%
406 | 12652 | 0.5%
246 | 8751 | 0.3%
343 | 7890 | 0.3%
Other values (11) | 17493 | 0.7%
Extreme values (5 smallest)

Value | Count | Frequency (%)
200 | 66 | < 0.1%
201 | 2215577 | 83.5%
202 | 6417 | 0.2%
203 | 18010 | 0.7%
206 | 1365 | 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
406 | 12652 | 0.5%
351 | 6 | < 0.1%
343 | 7890 | 0.3%
341 | 657 | < 0.1%
339 | 1003 | < 0.1%

Severity
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.2 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2651861
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
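Python's standard `unicodedata` module exposes the general category of each code point. It does not expose script or block. The sketch below counts a value's characters by category, using the long category names that appear in this report. For the timestamp `2016-02-08 05:46:00` it finds 14 Decimal Number, 2 Dash Punctuation, 2 Other Punctuation and 1 Space Separator characters. That matches the Start_Time totals further down: 37126054 = 14 × 2651861, and 5303722 = 2 × 2651861.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}

def category_counts(text):
    """Count characters by Unicode general category (long name when known)."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

print(category_counts("2016-02-08 05:46:00"))
```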

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 2
3rd row: 2
4th row: 3
5th row: 2

Common values

Value | Count | Frequency (%)
2 | 1754544 | 66.2%
3 | 886933 | 33.4%
4 | 9269 | 0.3%
1 | 1115 | < 0.1%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
2 | 1754544 | 66.2%
3 | 886933 | 33.4%
4 | 9269 | 0.3%
1 | 1115 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
2 | 1754544 | 66.2%
3 | 886933 | 33.4%
4 | 9269 | 0.3%
1 | 1115 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2651861 | 100.0%

Most frequent character per category

Decimal Number: identical to the Most occurring characters table above.

Most occurring scripts

Value | Count | Frequency (%)
Common | 2651861 | 100.0%

Most frequent character per script

Common: identical to the Most occurring characters table above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2651861 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.

Start_Time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 2582957
Distinct (%): 97.4%
Missing: 0
Missing (%): 0.0%
Memory size: 20.2 MiB

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 50385359
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2521389
Unique (%): 95.1%
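Distinct counts every different value. Unique counts only the values that occur exactly once, so Unique can never exceed Distinct (2521389 ≤ 2582957 here). Both percentages are relative to the 2651861 non-missing rows. Severity, by contrast, has 4 distinct values and 0 unique ones because each of its values repeats many times. A minimal sketch with a hypothetical helper:

```python
from collections import Counter

def distinct_and_unique(values):
    """Distinct = number of different values; unique = values seen exactly once."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique

sample = [
    "2018-11-25 01:22:49",
    "2018-11-25 01:22:49",
    "2016-02-08 05:46:00",
    "2016-02-08 06:07:59",
]
print(distinct_and_unique(sample))  # (3, 2)
```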

Sample

1st row: 2016-02-08 05:46:00
2nd row: 2016-02-08 06:07:59
3rd row: 2016-02-08 06:49:27
4th row: 2016-02-08 07:23:34
5th row: 2016-02-08 07:39:07

Common values

Value | Count | Frequency (%)
2018-11-25 01:22:49 | 53 | < 0.1%
2018-11-12 00:37:27 | 40 | < 0.1%
2018-12-18 07:11:45 | 37 | < 0.1%
2016-04-10 08:59:26 | 35 | < 0.1%
2017-09-09 09:03:14 | 23 | < 0.1%
2017-09-06 15:52:36 | 22 | < 0.1%
2019-12-17 06:32:11 | 22 | < 0.1%
2016-06-12 10:07:37 | 22 | < 0.1%
2016-05-21 08:30:42 | 21 | < 0.1%
2016-05-22 07:37:28 | 21 | < 0.1%
Other values (2582947) | 2651565 | > 99.9%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
2018-11-06 | 3400 | 0.1%
2018-11-09 | 3340 | 0.1%
2018-11-05 | 3156 | 0.1%
2017-09-15 | 3154 | 0.1%
2018-11-02 | 3089 | 0.1%
2018-10-26 | 3074 | 0.1%
2018-10-19 | 3066 | 0.1%
2018-10-18 | 3036 | 0.1%
2019-11-15 | 3035 | 0.1%
2017-09-29 | 3016 | 0.1%
Other values (87589) | 5272356 | 99.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 8958109 | 17.8%
1 | 7689838 | 15.3%
2 | 6412535 | 12.7%
- | 5303722 | 10.5%
: | 5303722 | 10.5%
(space) | 2651861 | 5.3%
8 | 2185072 | 4.3%
3 | 2145848 | 4.3%
5 | 2101887 | 4.2%
4 | 2048449 | 4.1%
Other values (3) | 5584316 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 37126054 | 73.7%
Dash Punctuation | 5303722 | 10.5%
Other Punctuation | 5303722 | 10.5%
Space Separator | 2651861 | 5.3%

Most frequent character per category

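Decimal Number
Value | Count | Frequency (%)
0 | 8958109 | 24.1%
1 | 7689838 | 20.7%
2 | 6412535 | 17.3%
8 | 2185072 | 5.9%
3 | 2145848 | 5.8%
5 | 2101887 | 5.7%
4 | 2048449 | 5.5%
7 | 1998891 | 5.4%
9 | 1966592 | 5.3%
6 | 1618833 | 4.4%

Dash Punctuation
Value | Count | Frequency (%)
- | 5303722 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2651861 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 5303722 | 100.0%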

Most occurring scripts

Value | Count | Frequency (%)
Common | 50385359 | 100.0%

Most frequent character per script

Common: identical to the Most occurring characters table above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 50385359 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.

End_Time
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 2578441
Distinct (%): 97.2%
Missing: 0
Missing (%): 0.0%
Memory size: 20.2 MiB

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

Characters and Unicode

Total characters: 50385359
Distinct characters: 13
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2511388
Unique (%): 94.7%

Sample

1st row: 2016-02-08 11:00:00
2nd row: 2016-02-08 06:37:59
3rd row: 2016-02-08 07:19:27
4th row: 2016-02-08 07:53:34
5th row: 2016-02-08 08:09:07

Common values

Value | Count | Frequency (%)
2018-11-25 02:51:02 | 46 | < 0.1%
2018-12-18 08:11:10 | 37 | < 0.1%
2016-10-14 19:50:00 | 24 | < 0.1%
2018-09-16 13:51:54 | 22 | < 0.1%
2017-09-07 05:22:04 | 21 | < 0.1%
2018-03-28 09:35:05 | 21 | < 0.1%
2020-04-10 20:21:52 | 21 | < 0.1%
2016-10-14 16:30:00 | 18 | < 0.1%
2016-10-14 18:30:00 | 16 | < 0.1%
2019-04-05 07:52:58 | 16 | < 0.1%
Other values (2578431) | 2651619 | > 99.9%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
2018-11-06 | 3401 | 0.1%
2018-11-09 | 3350 | 0.1%
2018-11-05 | 3145 | 0.1%
2017-09-15 | 3132 | 0.1%
2018-11-02 | 3082 | 0.1%
2018-10-26 | 3079 | 0.1%
2018-10-19 | 3059 | 0.1%
2018-10-18 | 3042 | 0.1%
2019-11-15 | 3039 | 0.1%
2019-11-22 | 3028 | 0.1%
Other values (87546) | 5272365 | 99.4%

Most occurring characters

Value | Count | Frequency (%)
0 | 9100940 | 18.1%
1 | 7751081 | 15.4%
2 | 6538682 | 13.0%
- | 5303722 | 10.5%
: | 5303722 | 10.5%
(space) | 2651861 | 5.3%
8 | 2206844 | 4.4%
3 | 2113947 | 4.2%
9 | 2079685 | 4.1%
5 | 1994678 | 4.0%
Other values (3) | 5340197 | 10.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 37126054 | 73.7%
Dash Punctuation | 5303722 | 10.5%
Other Punctuation | 5303722 | 10.5%
Space Separator | 2651861 | 5.3%

Most frequent character per category

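Decimal Number
Value | Count | Frequency (%)
0 | 9100940 | 24.5%
1 | 7751081 | 20.9%
2 | 6538682 | 17.6%
8 | 2206844 | 5.9%
3 | 2113947 | 5.7%
9 | 2079685 | 5.6%
5 | 1994678 | 5.4%
4 | 1949696 | 5.3%
7 | 1887334 | 5.1%
6 | 1503167 | 4.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5303722 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 2651861 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
: | 5303722 | 100.0%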

Most occurring scripts

Value | Count | Frequency (%)
Common | 50385359 | 100.0%

Most frequent character per script

Common: identical to the Most occurring characters table above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 50385359 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.
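Start_Time and End_Time were profiled as text (categorical). That is why they are flagged as high-cardinality and why their character statistics are dominated by digits. Parsing them as timestamps makes time arithmetic possible. The sketch below assumes that every value follows the `YYYY-MM-DD HH:MM:SS` layout implied by the fixed length of 19. It also assumes that the sample rows of the two columns come from the same records, since both presumably show the first rows of the data.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the 19-character strings above

def duration_minutes(start, end):
    """Minutes elapsed between two timestamp strings in TIMESTAMP_FORMAT."""
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(start, TIMESTAMP_FORMAT)
    return delta.total_seconds() / 60

# First two sample rows of Start_Time and End_Time.
print(duration_minutes("2016-02-08 05:46:00", "2016-02-08 11:00:00"))  # 314.0
print(duration_minutes("2016-02-08 06:07:59", "2016-02-08 06:37:59"))  # 30.0
```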

Start_Lat
Real number (ℝ≥0)

Distinct: 827046
Distinct (%): 31.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36.10331858
Minimum: 24.555269
Maximum: 49.002201
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 24.555269
5th percentile: 28.304241
Q1: 33.148983
Median: 35.391338
Q3: 39.98476
95th percentile: 43.181423
Maximum: 49.002201
Range: 24.446932
Interquartile range (IQR): 6.835777

Descriptive statistics

Standard deviation: 4.839869987
Coefficient of variation (CV): 0.1340560972
Kurtosis: -0.5485835472
Mean: 36.10331858
Median Absolute Deviation (MAD): 3.571625
Skewness: 0.08573729209
Sum: 95740982.52
Variance: 23.42434149
Monotonicity: Not monotonic
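The Median Absolute Deviation (MAD) is the median of the absolute deviations from the median. It is robust to outliers. For TMC and Distance(mi), where most values equal the median, it is 0 even though the standard deviation is not. The sketch below assumes the unscaled form, with no 1.4826 consistency factor.

```python
import statistics

def median_absolute_deviation(values):
    """Median of |x - median(values)|, without any scaling constant."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]))      # 1
print(median_absolute_deviation([201, 201, 201, 241, 406]))  # 0
```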
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
33.941364 | 539 | < 0.1%
42.476501 | 534 | < 0.1%
33.744976 | 530 | < 0.1%
37.808498 | 505 | < 0.1%
34.858925 | 493 | < 0.1%
33.876289 | 434 | < 0.1%
42.368423 | 433 | < 0.1%
33.781532 | 432 | < 0.1%
25.789072 | 429 | < 0.1%
40.850067 | 415 | < 0.1%
Other values (827036) | 2647117 | 99.8%
Extreme values (5 smallest)

Value | Count | Frequency (%)
24.555269 | 1 | < 0.1%
24.5574 | 1 | < 0.1%
24.55987 | 1 | < 0.1%
24.560246 | 1 | < 0.1%
24.560688 | 1 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
49.002201 | 1 | < 0.1%
49.000759 | 1 | < 0.1%
48.999901 | 1 | < 0.1%
48.999569 | 1 | < 0.1%
48.998241 | 4 | < 0.1%

Start_Lng
Real number (ℝ)

Distinct: 789293
Distinct (%): 29.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -93.3411788
Minimum: -124.623833
Maximum: -67.839745
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.2 MiB

Quantile statistics

Minimum: -124.623833
5th percentile: -122.079926
Q1: -105.962898
Median: -87.462791
Q3: -80.821632
95th percentile: -73.884247
Maximum: -67.839745
Range: 56.784088
Interquartile range (IQR): 25.141266

Descriptive statistics

Standard deviation: 16.23687298
Coefficient of variation (CV): -0.173951874
Kurtosis: -1.007185342
Mean: -93.3411788
Median Absolute Deviation (MAD): 9.255227
Skewness: -0.6380036073
Sum: -247527831.7
Variance: 263.636044
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
-118.096634 | 535 | < 0.1%
-83.111794 | 534 | < 0.1%
-84.390343 | 529 | < 0.1%
-122.366852 | 511 | < 0.1%
-82.259857 | 494 | < 0.1%
-118.368263 | 472 | < 0.1%
-84.390869 | 455 | < 0.1%
-83.058128 | 434 | < 0.1%
-80.204353 | 430 | < 0.1%
-73.944817 | 419 | < 0.1%
Other values (789283) | 2647048 | 99.8%
Extreme values (5 smallest)

Value | Count | Frequency (%)
-124.623833 | 1 | < 0.1%
-124.534439 | 1 | < 0.1%
-124.493149 | 1 | < 0.1%
-124.484421 | 1 | < 0.1%
-124.479179 | 1 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
-67.839745 | 1 | < 0.1%
-67.841858 | 1 | < 0.1%
-68.060165 | 1 | < 0.1%
-68.14003 | 1 | < 0.1%
-68.380852 | 1 | < 0.1%
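The histograms use a fixed number of equal-width bins: 50 for most numeric variables and 21 for TMC. For Start_Lng, 50 bins over a range of 56.784088 degrees are about 1.14 degrees wide. Below is a pure-Python sketch of equal-width binning. It follows the convention of `numpy.histogram`, where the last bin is closed on the right. It is illustrative only and does not claim to be the report's plotting code.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width bins spanning [min, max].

    Every bin is half-open [a, b) except the last, which also includes the maximum.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        i = int((x - lo) / width) if width > 0 else 0
        counts[min(i, bins - 1)] += 1  # clamp the maximum into the last bin
    return counts

print(fixed_width_histogram([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 5))  # [2, 2, 2, 2, 3]
```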

Distance(mi)
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 3875
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1986762127
Minimum: 0
Maximum: 441.75
Zeros: 2215378
Zeros (%): 83.5%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0.6299999952
Maximum: 441.75
Range: 441.75
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.584473869
Coefficient of variation (CV): 7.9751564
Kurtosis: 4880.659121
Mean: 0.1986762127
Median Absolute Deviation (MAD): 0
Skewness: 37.62493565
Sum: 526861.7
Variance: 2.510557442
Monotonicity: Not monotonic
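A mass of zeros plus a long right tail, as in Distance(mi), yields large positive skewness and kurtosis. The sketch below assumes the report uses pandas' bias-adjusted sample estimators: the adjusted Fisher-Pearson skewness G1 and the excess kurtosis G2, which is about 0 for normally distributed data. It computes both from central moments. The helper names are hypothetical.

```python
import math

def _central_moment(xs, k):
    """k-th central moment with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3)."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis G2 (needs n >= 4)."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(round(sample_skewness([0, 0, 0, 10]), 6))          # 2.0
print(round(sample_excess_kurtosis([1, 2, 3, 4, 5]), 6))  # -1.2
```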
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
0 | 2215378 | 83.5%
0.01 | 246986 | 9.3%
0.009999999776 | 12808 | 0.5%
0.01999999955 | 5885 | 0.2%
0.6800000072 | 808 | < 0.1%
0.4499999881 | 793 | < 0.1%
0.4300000072 | 780 | < 0.1%
0.4900000095 | 776 | < 0.1%
0.3899999857 | 774 | < 0.1%
0.3100000024 | 770 | < 0.1%
Other values (3865) | 166103 | 6.3%
Extreme values (5 smallest)

Value | Count | Frequency (%)
0 | 2215378 | 83.5%
0.009999999776 | 12808 | 0.5%
0.01 | 246986 | 9.3%
0.01999999955 | 5885 | 0.2%
0.02 | 9 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
441.75 | 1 | < 0.1%
333.6300049 | 1 | < 0.1%
254.3999939 | 1 | < 0.1%
251.2200012 | 1 | < 0.1%
227.2100067 | 1 | < 0.1%

Side
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.2 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2651861
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: R
2nd row: L
3rd row: R
4th row: R
5th row: R

Common values

Value | Count | Frequency (%)
R | 2115329 | 79.8%
L | 536531 | 20.2%
(space) | 1 | < 0.1%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
r | 2115329 | 79.8%
l | 536531 | 20.2%

Most occurring characters

Value | Count | Frequency (%)
R | 2115329 | 79.8%
L | 536531 | 20.2%
(space) | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2651860 | > 99.9%
Space Separator | 1 | < 0.1%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
R | 2115329 | 79.8%
L | 536531 | 20.2%

Space Separator
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2651860 | > 99.9%
Common | 1 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
R | 2115329 | 79.8%
L | 536531 | 20.2%

Common
Value | Count | Frequency (%)
(space) | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2651861 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.

State
Categorical

Distinct: 49
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.2 MiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 5303722
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OH
2nd row: OH
3rd row: OH
4th row: OH
5th row: OH

Common values

Value | Count | Frequency (%)
CA | 483157 | 18.2%
TX | 299660 | 11.3%
FL | 214170 | 8.1%
SC | 184379 | 7.0%
NC | 141834 | 5.3%
NY | 126453 | 4.8%
PA | 92719 | 3.5%
MI | 77183 | 2.9%
VA | 76361 | 2.9%
GA | 75310 | 2.8%
Other values (39) | 880635 | 33.2%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
ca | 483157 | 18.2%
tx | 299660 | 11.3%
fl | 214170 | 8.1%
sc | 184379 | 7.0%
nc | 141834 | 5.3%
ny | 126453 | 4.8%
pa | 92719 | 3.5%
mi | 77183 | 2.9%
va | 76361 | 2.9%
ga | 75310 | 2.8%
Other values (39) | 880635 | 33.2%

Most occurring characters

Value | Count | Frequency (%)
A | 997313 | 18.8%
C | 862776 | 16.3%
N | 490948 | 9.3%
T | 405640 | 7.6%
L | 391264 | 7.4%
X | 299660 | 5.6%
M | 234004 | 4.4%
F | 214170 | 4.0%
I | 203952 | 3.8%
S | 194102 | 3.7%
Other values (14) | 1009893 | 19.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 5303722 | 100.0%

Most frequent character per category

Uppercase Letter: identical to the Most occurring characters table above.

Most occurring scripts

Value | Count | Frequency (%)
Latin | 5303722 | 100.0%

Most frequent character per script

Latin: identical to the Most occurring characters table above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 5303722 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.

Temperature(F)
Real number (ℝ)

MISSING

Distinct: 826
Distinct (%): < 0.1%
Missing: 45294
Missing (%): 1.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.55807182
Minimum: -89
Maximum: 203
Zeros: 493
Zeros (%): < 0.1%
Memory size: 20.2 MiB

Quantile statistics

Minimum: -89
5th percentile: 29
Q1: 50
Median: 64.9
Q3: 77
95th percentile: 89.1
Maximum: 203
Range: 292
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 18.62876513
Coefficient of variation (CV): 0.2977835567
Kurtosis: -0.01953030608
Mean: 62.55807182
Median Absolute Deviation (MAD): 12.9
Skewness: -0.5091188724
Sum: 163061805.6
Variance: 347.0308902
Monotonicity: Not monotonic
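Statistics for a variable with missing values are computed over the non-missing rows only. For Temperature(F), Sum divided by the 2651861 - 45294 = 2606567 non-missing rows gives the reported mean. In the sketch below, None and NaN stand in for missing cells. That representation is an assumption made for illustration.

```python
import math

def mean_and_sum(values):
    """Mean and sum over non-missing values; None and NaN are skipped."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    total = sum(present)
    return total / len(present), total

print(mean_and_sum([50.0, None, 77.0, float("nan"), 64.9]))

# Report check: Sum / (observations - missing) reproduces the reported mean.
n_present = 2651861 - 45294
print(round(163061805.6 / n_present, 4))  # 62.5581
```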
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
77 | 60762 | 2.3%
68 | 59117 | 2.2%
73 | 54275 | 2.0%
59 | 52197 | 2.0%
75 | 50638 | 1.9%
72 | 50297 | 1.9%
70 | 49209 | 1.9%
63 | 46801 | 1.8%
64 | 46359 | 1.7%
79 | 45831 | 1.7%
Other values (816) | 2091081 | 78.9%
(Missing) | 45294 | 1.7%
Extreme values (5 smallest)

Value | Count | Frequency (%)
-89 | 7 | < 0.1%
-77.8 | 10 | < 0.1%
-33 | 1 | < 0.1%
-32.8 | 1 | < 0.1%
-29.9 | 1 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
203 | 1 | < 0.1%
189 | 1 | < 0.1%
174 | 2 | < 0.1%
167 | 1 | < 0.1%
161.6 | 1 | < 0.1%

Humidity(%)
Real number (ℝ≥0)

MISSING

Distinct: 100
Distinct (%): < 0.1%
Missing: 48306
Missing (%): 1.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 66.32201279
Minimum: 1
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 26
Q1: 50
Median: 69
Q3: 86
95th percentile: 97
Maximum: 100
Range: 99
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 22.37503884
Coefficient of variation (CV): 0.3373697194
Kurtosis: -0.679863749
Mean: 66.32201279
Median Absolute Deviation (MAD): 18
Skewness: -0.4242431196
Sum: 172673008
Variance: 500.6423633
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
100 | 114775 | 4.3%
93 | 104841 | 4.0%
90 | 65221 | 2.5%
87 | 63477 | 2.4%
96 | 52004 | 2.0%
94 | 49241 | 1.9%
89 | 47940 | 1.8%
84 | 47352 | 1.8%
81 | 44877 | 1.7%
82 | 43964 | 1.7%
Other values (90) | 1969863 | 74.3%
(Missing) | 48306 | 1.8%
Extreme values (5 smallest)

Value | Count | Frequency (%)
1 | 2 | < 0.1%
2 | 10 | < 0.1%
3 | 48 | < 0.1%
4 | 515 | < 0.1%
5 | 1064 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
100 | 114775 | 4.3%
99 | 3939 | 0.1%
98 | 2219 | 0.1%
97 | 35367 | 1.3%
96 | 52004 | 2.0%

Pressure(in)
Real number (ℝ≥0)

MISSING

Distinct: 988
Distinct (%): < 0.1%
Missing: 38857
Missing (%): 1.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.7739001
Minimum: 0
Maximum: 58.04
Zeros: 1
Zeros (%): < 0.1%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 28.85
Q1: 29.73
Median: 29.95
Q3: 30.09
95th percentile: 30.33
Maximum: 58.04
Range: 58.04
Interquartile range (IQR): 0.36

Descriptive statistics

Standard deviation: 0.7501909159
Coefficient of variation (CV): 0.02519625959
Kurtosis: 48.36963466
Mean: 29.7739001
Median Absolute Deviation (MAD): 0.17
Skewness: -5.191468097
Sum: 77799320.06
Variance: 0.5627864103
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
30.01 | 53925 | 2.0%
29.96 | 52678 | 2.0%
29.99 | 52632 | 2.0%
30.04 | 51724 | 2.0%
29.94 | 50188 | 1.9%
30.06 | 49674 | 1.9%
29.91 | 46084 | 1.7%
30.03 | 45915 | 1.7%
30 | 45057 | 1.7%
30.09 | 45045 | 1.7%
Other values (978) | 2120082 | 79.9%
Extreme values (5 smallest)

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.12 | 1 | < 0.1%
0.29 | 2 | < 0.1%
0.3 | 4 | < 0.1%
0.39 | 1 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
58.04 | 1 | < 0.1%
33.04 | 2 | < 0.1%
31.15 | 1 | < 0.1%
31.14 | 1 | < 0.1%
31.12 | 2 | < 0.1%

Visibility(mi)
Real number (ℝ≥0)

MISSING

Distinct: 77
Distinct (%): < 0.1%
Missing: 53002
Missing (%): 2.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.100616274
Minimum: 0
Maximum: 140
Zeros: 1133
Zeros (%): < 0.1%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 10
Median: 10
Q3: 10
95th percentile: 10
Maximum: 140
Range: 140
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.773215094
Coefficient of variation (CV): 0.3047282745
Kurtosis: 76.02581646
Mean: 9.100616274
Median Absolute Deviation (MAD): 0
Skewness: 2.924693413
Sum: 23651218.51
Variance: 7.690721956
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
10 | 2064841 | 77.9%
7 | 82235 | 3.1%
9 | 71282 | 2.7%
8 | 56596 | 2.1%
5 | 52816 | 2.0%
6 | 45955 | 1.7%
4 | 42294 | 1.6%
3 | 41422 | 1.6%
2 | 34538 | 1.3%
1 | 23969 | 0.9%
Other values (67) | 82911 | 3.1%
(Missing) | 53002 | 2.0%
Extreme values (5 smallest)

Value | Count | Frequency (%)
0 | 1133 | < 0.1%
0.06 | 114 | < 0.1%
0.1 | 996 | < 0.1%
0.12 | 362 | < 0.1%
0.19 | 5 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
140 | 1 | < 0.1%
111 | 2 | < 0.1%
105 | 1 | < 0.1%
101 | 1 | < 0.1%
100 | 7 | < 0.1%

Wind_Direction
Categorical

MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 40241
Missing (%): 1.5%
Memory size: 20.2 MiB

Length

Max length: 8
Median length: 3
Mean length: 3.253866183
Min length: 1
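Mean length is taken over non-missing values only. For Wind_Direction, 8497862 total characters divided by the 2651861 - 40241 = 2611620 non-missing rows gives 3.2539. Below is a sketch of the four length statistics using a hypothetical helper. Missing values are represented as None, which is an assumption.

```python
import statistics

def length_stats(values):
    """Max/median/mean/min string length over non-missing values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
    }

print(length_stats(["Calm", "CALM", "SSW", "South", "SW", None]))

# Report check for Wind_Direction: total characters / non-missing rows.
print(round(8497862 / (2651861 - 40241), 6))  # 3.253866
```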

Characters and Unicode

Total characters: 8497862
Distinct characters: 22
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Calm
2nd row: Calm
3rd row: SW
4th row: SW
5th row: SW

Common values

Value | Count | Frequency (%)
Calm | 285016 | 10.7%
CALM | 164998 | 6.2%
SSW | 136302 | 5.1%
South | 134871 | 5.1%
SW | 129211 | 4.9%
SSE | 125424 | 4.7%
WNW | 123072 | 4.6%
West | 121874 | 4.6%
WSW | 119376 | 4.5%
NW | 118928 | 4.5%
Other values (14) | 1152548 | 43.5%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
calm | 450014 | 17.2%
ssw | 136302 | 5.2%
south | 134871 | 5.2%
sw | 129211 | 4.9%
sse | 125424 | 4.8%
wnw | 123072 | 4.7%
west | 121874 | 4.7%
wsw | 119376 | 4.6%
nw | 118928 | 4.6%
north | 116153 | 4.4%
Other values (13) | 1036395 | 39.7%
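The word table appears to be built from lowercased, whitespace-separated tokens. The count for calm, 450014, is exactly Calm (285016) plus CALM (164998). That tokenization rule is inferred from these numbers, not taken from pandas-profiling's source. Below is a sketch of such counting with a hypothetical helper.

```python
from collections import Counter

def word_counts(values):
    """Lowercase each non-missing value, split on whitespace, and count the words."""
    counts = Counter()
    for v in values:
        if v is not None:
            counts.update(v.lower().split())
    return counts

print(word_counts(["Calm", "CALM", "Mostly Cloudy", "Cloudy", None]))
```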

Most occurring characters

Value | Count | Frequency (%)
S | 1175417 | 13.8%
W | 1160207 | 13.7%
N | 1008720 | 11.9%
E | 894911 | 10.5%
a | 542454 | 6.4%
t | 451214 | 5.3%
C | 450014 | 5.3%
l | 374577 | 4.4%
m | 285016 | 3.4%
o | 251024 | 3.0%
Other values (12) | 1904308 | 22.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 5411221 | 63.7%
Lowercase Letter | 3086641 | 36.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 542454 | 17.6%
t | 451214 | 14.6%
l | 374577 | 12.1%
m | 285016 | 9.2%
o | 251024 | 8.1%
h | 251024 | 8.1%
e | 211435 | 6.9%
r | 205714 | 6.7%
s | 200190 | 6.5%
u | 134871 | 4.4%
Other values (2) | 179122 | 5.8%

Uppercase Letter
Value | Count | Frequency (%)
S | 1175417 | 21.7%
W | 1160207 | 21.4%
N | 1008720 | 18.6%
E | 894911 | 16.5%
C | 450014 | 8.3%
A | 210797 | 3.9%
L | 164998 | 3.0%
M | 164998 | 3.0%
V | 135360 | 2.5%
R | 45799 | 0.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 8497862 | 100.0%

Most frequent character per script

Latin: identical to the Most occurring characters table above.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 8497862 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.

Wind_Speed(mph)
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 139
Distinct (%): < 0.1%
Missing: 344548
Missing (%): 13.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.094265321
Minimum: 0
Maximum: 822.8
Zeros: 164999
Zeros (%): 6.2%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 4.6
Median: 7
Q3: 10.4
95th percentile: 17
Maximum: 822.8
Range: 822.8
Interquartile range (IQR): 5.8

Descriptive statistics

Standard deviation: 5.130874422
Coefficient of variation (CV): 0.6338900714
Kurtosis: 1835.324876
Mean: 8.094265321
Median Absolute Deviation (MAD): 2.4
Skewness: 13.54744091
Sum: 18676003.6
Variance: 26.32587234
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
4.6 | 168473 | 6.4%
5.8 | 166509 | 6.3%
0 | 164999 | 6.2%
3.5 | 157352 | 5.9%
6.9 | 154424 | 5.8%
8.1 | 137879 | 5.2%
9.2 | 122169 | 4.6%
10.4 | 99890 | 3.8%
5 | 98431 | 3.7%
6 | 95379 | 3.6%
Other values (129) | 941808 | 35.5%
(Missing) | 344548 | 13.0%
Extreme values (5 smallest)

Value | Count | Frequency (%)
0 | 164999 | 6.2%
1 | 72 | < 0.1%
1.2 | 319 | < 0.1%
2 | 156 | < 0.1%
2.3 | 640 | < 0.1%
Extreme values (5 largest)

Value | Count | Frequency (%)
822.8 | 5 | < 0.1%
703.1 | 2 | < 0.1%
580 | 2 | < 0.1%
328 | 1 | < 0.1%
255 | 1 | < 0.1%

Precipitation(in)
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 256
Distinct (%): < 0.1%
Missing: 1529311
Missing (%): 57.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.01595538729
Minimum: 0
Maximum: 25
Zeros: 937421
Zeros (%): 35.3%
Memory size: 20.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0.07
Maximum: 25
Range: 25
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.1849330967
Coefficient of variation (CV): 11.59063665
Kurtosis: 2806.97326
Mean: 0.01595538729
Median Absolute Deviation (MAD): 0
Skewness: 49.31946349
Sum: 17910.72
Variance: 0.03420025026
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value | Count | Frequency (%)
0 | 937421 | 35.3%
0.01 | 51978 | 2.0%
0.02 | 25730 | 1.0%
0.03 | 17459 | 0.7%
0.04 | 12841 | 0.5%
0.05 | 10346 | 0.4%
0.06 | 8267 | 0.3%
0.07 | 6766 | 0.3%
0.08 | 5417 | 0.2%
0.09 | 4825 | 0.2%
Other values (246) | 41500 | 1.6%
(Missing) | 1529311 | 57.7%
Extreme values (5 smallest)

Value | Count | Frequency (%)
0 | 937421 | 35.3%
0.01 | 51978 | 2.0%
0.02 | 25730 | 1.0%
0.03 | 17459 | 0.7%
0.04 | 12841 | 0.5%
Extreme values (5 largest)

Value | Count | Frequency (%)
25 | 1 | < 0.1%
10.8 | 1 | < 0.1%
10.18 | 1 | < 0.1%
10.16 | 1 | < 0.1%
10.14 | 2 | < 0.1%

Weather_Condition
Categorical

HIGH CARDINALITY
MISSING

Distinct: 121
Distinct (%): < 0.1%
Missing: 52931
Missing (%): 2.0%
Memory size: 20.2 MiB

Length

Max length: 35
Median length: 8
Mean length: 8.383881059
Min length: 3

Characters and Unicode

Total characters: 21789120
Distinct characters: 45
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 12
Unique (%): < 0.1%

Sample

1st row: Light Rain
2nd row: Light Rain
3rd row: Overcast
4th row: Mostly Cloudy
5th row: Mostly Cloudy

Common values

Value | Count | Frequency (%)
Clear | 618969 | 23.3%
Fair | 416145 | 15.7%
Mostly Cloudy | 370250 | 14.0%
Overcast | 290639 | 11.0%
Partly Cloudy | 257555 | 9.7%
Cloudy | 156239 | 5.9%
Scattered Clouds | 155332 | 5.9%
Light Rain | 131040 | 4.9%
Light Snow | 34489 | 1.3%
Rain | 30462 | 1.1%
Other values (111) | 137810 | 5.2%
(Missing) | 52931 | 2.0%
[Figure: Histogram of lengths of the category]
Most occurring words

Value | Count | Frequency (%)
cloudy | 791514 | 21.6%
clear | 618969 | 16.9%
fair | 420668 | 11.5%
mostly | 373102 | 10.2%
overcast | 290639 | 7.9%
partly | 259248 | 7.1%
light | 187692 | 5.1%
rain | 187642 | 5.1%
clouds | 155332 | 4.2%
scattered | 155332 | 4.2%
Other values (48) | 225406 | 6.1%

Most occurring characters

Value | Count | Frequency (%)
l | 2214651 | 10.2%
a | 1989459 | 9.1%
r | 1798665 | 8.3%
C | 1565832 | 7.2%
y | 1461871 | 6.7%
t | 1454018 | 6.7%
o | 1415521 | 6.5%
e | 1315553 | 6.0%
d | 1145018 | 5.3%
(space) | 1066614 | 4.9%
Other values (35) | 6361918 | 29.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 17064018 | 78.3%
Uppercase Letter | 3636457 | 16.7%
Space Separator | 1066614 | 4.9%
Other Punctuation | 16141 | 0.1%
Dash Punctuation | 5890 | < 0.1%

Most frequent character per category

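Lowercase Letter
Value | Count | Frequency (%)
l | 2214651 | 13.0%
a | 1989459 | 11.7%
r | 1798665 | 10.5%
y | 1461871 | 8.6%
t | 1454018 | 8.5%
o | 1415521 | 8.3%
e | 1315553 | 7.7%
d | 1145018 | 6.7%
u | 966245 | 5.7%
i | 849441 | 5.0%
Other values (14) | 2453576 | 14.4%

Uppercase Letter
Value | Count | Frequency (%)
C | 1565832 | 43.1%
F | 452954 | 12.5%
M | 376357 | 10.3%
O | 290639 | 8.0%
P | 262139 | 7.2%
S | 207848 | 5.7%
L | 187696 | 5.2%
R | 187642 | 5.2%
H | 45388 | 1.2%
T | 25013 | 0.7%
Other values (8) | 34949 | 1.0%

Space Separator
Value | Count | Frequency (%)
(space) | 1066614 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
/ | 16141 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 5890 | 100.0%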

Most occurring scripts

Value | Count | Frequency (%)
Latin | 20700475 | 95.0%
Common | 1088645 | 5.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
l | 2214651 | 10.7%
a | 1989459 | 9.6%
r | 1798665 | 8.7%
C | 1565832 | 7.6%
y | 1461871 | 7.1%
t | 1454018 | 7.0%
o | 1415521 | 6.8%
e | 1315553 | 6.4%
d | 1145018 | 5.5%
u | 966245 | 4.7%
Other values (32) | 5373642 | 26.0%

Common
Value | Count | Frequency (%)
(space) | 1066614 | 98.0%
/ | 16141 | 1.5%
- | 5890 | 0.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 21789120 | 100.0%

Most frequent character per block

ASCII: identical to the Most occurring characters table above.

Sunrise_Sunset
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 57
Missing (%): < 0.1%
Memory size: 20.2 MiB

Length

Max length: 5
Median length: 3
Mean length: 3.515517738
Min length: 3

Characters and Unicode

Total characters: 9322464
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Night
2nd row: Night
3rd row: Night
4th row: Night
5th row: Day

Common values

Value | Count | Frequency (%)
Day | 1968278 | 74.2%
Night | 683526 | 25.8%
(Missing) | 57 | < 0.1%
[Figure: Histogram of lengths of the category]
Value | Count | Frequency (%)
day | 1968278 | 74.2%
night | 683526 | 25.8%

Most occurring characters

Value | Count | Frequency (%)
D | 1968278 | 21.1%
a | 1968278 | 21.1%
y | 1968278 | 21.1%
N | 683526 | 7.3%
i | 683526 | 7.3%
g | 683526 | 7.3%
h | 683526 | 7.3%
t | 683526 | 7.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 6670660 | 71.6%
Uppercase Letter | 2651804 | 28.4%

Most frequent character per category

Value | Count | Frequency (%)
a | 1968278 | 29.5%
y | 1968278 | 29.5%
i | 683526 | 10.2%
g | 683526 | 10.2%
h | 683526 | 10.2%
t | 683526 | 10.2%

Value | Count | Frequency (%)
D | 1968278 | 74.2%
N | 683526 | 25.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 9322464 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
D | 1968278 | 21.1%
a | 1968278 | 21.1%
y | 1968278 | 21.1%
N | 683526 | 7.3%
i | 683526 | 7.3%
g | 683526 | 7.3%
h | 683526 | 7.3%
t | 683526 | 7.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9322464 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
D | 1968278 | 21.1%
a | 1968278 | 21.1%
y | 1968278 | 21.1%
N | 683526 | 7.3%
i | 683526 | 7.3%
g | 683526 | 7.3%
h | 683526 | 7.3%
t | 683526 | 7.3%

Interactions

[Figures: 90 pairwise interaction plots]

Correlations


Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
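
A minimal plain-Python sketch of that definition (no pandas or NumPy; `pearson_r` is a hypothetical helper name). The 1/n factors in the covariance and the two standard deviations cancel, so raw sums of deviations are enough:

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) normalisation cancels between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(x, y))
print(pearson_r(x, [10 * v + 7 for v in y]))  # unchanged: location and positive-scale invariance
print(pearson_r(x, [-v for v in y]))          # sign flips when one scale factor is negative
```

A constant input has zero variance, so this sketch raises `ZeroDivisionError` in that case rather than returning a value.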

Spearman's ρ

The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
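
In other words, ρ is Pearson's r applied to ranks. A minimal sketch, assuming tied values receive the average of the ranks they span (the usual convention); `average_ranks` and `spearman_rho` are hypothetical helper names:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y))  # perfectly monotonic, so rho is 1
print(pearson_r(x, y))     # below 1 because the relationship is not linear
```
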

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

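To calculate τ for two variables X and Y with n observations, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form; when ties are present, the τ-b variant adjusts the denominator to account for them.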

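A direct sketch of the τ-a formula (`kendall_tau_a` is a hypothetical helper). It examines all n(n-1)/2 pairs, so it suits small inputs only, not columns with millions of rows:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in X or Y count as neither concordant nor discordant; tau-a does
    not adjust the denominator for ties (tau-b does).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant, 6 pairs
```
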
Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package provides extensive documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).

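A minimal standard-library sketch of the bias-corrected Cramér's V, assuming the commonly cited form of Bergsma's correction: φ² = χ²/n is reduced by its expected bias (k-1)(r-1)/(n-1) and clipped at zero, and the table dimensions r and k are shrunk accordingly. `cramers_v_corrected` is a hypothetical helper, not the report's own code:

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two nominal sequences of equal length."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson's chi-squared statistic over every cell of the r x k contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma-style correction: subtract the expected bias, then shrink the dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return math.nan  # a constant variable: association is undefined
    return math.sqrt(phi2_corr / denom)


perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(perfect, independent)
```
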
Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
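
A minimal sketch of nullity correlation, assuming it is defined as Pearson's r between the 0/1 missingness indicators of two columns, with `None` standing in for a missing cell. `nullity_correlation` is a hypothetical helper, and the three columns are toy data, not values from this dataset:

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson's r between the missingness indicators (1 = missing) of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        return math.nan  # a column that is always (or never) missing has no nullity variance
    return cov / math.sqrt(ss_a * ss_b)


# Toy columns: col_b is missing exactly when col_a is; col_c tends to be missing when col_a is present.
col_a = [1, None, 2, None, 3]
col_b = [4, None, 5, None, 6]
col_c = [None, 7, None, 8, 9]
print(nullity_correlation(col_a, col_b))
print(nullity_correlation(col_a, col_c))
```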
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

# | TMC | Severity | Start_Time | End_Time | Start_Lat | Start_Lng | Distance(mi) | Side | State | Temperature(F) | Humidity(%) | Pressure(in) | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition | Sunrise_Sunset
0 | 201.0 | 3 | 2016-02-08 05:46:00 | 2016-02-08 11:00:00 | 39.865147 | -84.058723 | 0.01 | R | OH | 36.9 | 91.0 | 29.68 | 10.0 | Calm | NaN | 0.02 | Light Rain | Night
1 | 201.0 | 2 | 2016-02-08 06:07:59 | 2016-02-08 06:37:59 | 39.928059 | -82.831184 | 0.01 | L | OH | 37.9 | 100.0 | 29.65 | 10.0 | Calm | NaN | 0.00 | Light Rain | Night
2 | 201.0 | 2 | 2016-02-08 06:49:27 | 2016-02-08 07:19:27 | 39.063148 | -84.032608 | 0.01 | R | OH | 36.0 | 100.0 | 29.67 | 10.0 | SW | 3.5 | NaN | Overcast | Night
3 | 201.0 | 3 | 2016-02-08 07:23:34 | 2016-02-08 07:53:34 | 39.747753 | -84.205582 | 0.01 | R | OH | 35.1 | 96.0 | 29.64 | 9.0 | SW | 4.6 | NaN | Mostly Cloudy | Night
4 | 201.0 | 2 | 2016-02-08 07:39:07 | 2016-02-08 08:09:07 | 39.627781 | -84.188354 | 0.01 | R | OH | 36.0 | 89.0 | 29.65 | 6.0 | SW | 3.5 | NaN | Mostly Cloudy | Day
5 | 201.0 | 3 | 2016-02-08 07:44:26 | 2016-02-08 08:14:26 | 40.100590 | -82.925194 | 0.01 | R | OH | 37.9 | 97.0 | 29.63 | 7.0 | SSW | 3.5 | 0.03 | Light Rain | Day
6 | 201.0 | 2 | 2016-02-08 07:59:35 | 2016-02-08 08:29:35 | 39.758274 | -84.230507 | 0.00 | R | OH | 34.0 | 100.0 | 29.66 | 7.0 | WSW | 3.5 | NaN | Overcast | Day
7 | 201.0 | 3 | 2016-02-08 07:59:58 | 2016-02-08 08:29:58 | 39.770382 | -84.194901 | 0.01 | R | OH | 34.0 | 100.0 | 29.66 | 7.0 | WSW | 3.5 | NaN | Overcast | Day
8 | 201.0 | 2 | 2016-02-08 08:00:40 | 2016-02-08 08:30:40 | 39.778061 | -84.172005 | 0.00 | L | OH | 33.3 | 99.0 | 29.67 | 5.0 | SW | 1.2 | NaN | Mostly Cloudy | Day
9 | 201.0 | 3 | 2016-02-08 08:10:04 | 2016-02-08 08:40:04 | 40.100590 | -82.925194 | 0.01 | R | OH | 37.4 | 100.0 | 29.62 | 3.0 | SSW | 4.6 | 0.02 | Light Rain | Day

Last rows

# | TMC | Severity | Start_Time | End_Time | Start_Lat | Start_Lng | Distance(mi) | Side | State | Temperature(F) | Humidity(%) | Pressure(in) | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition | Sunrise_Sunset
2651851 | 201.0 | 3 | 2017-08-30 17:32:09 | 2017-08-30 18:16:19 | 33.774685 | -118.046837 | 0.0 | R | CA | 87.6 | 47.0 | 29.68 | 10.0 | South | 5.8 | NaN | Partly Cloudy | Day
2651852 | 201.0 | 2 | 2017-08-30 17:31:39 | 2017-08-30 18:15:49 | 33.853939 | -117.906784 | 0.0 | R | CA | 93.0 | 34.0 | 29.66 | 10.0 | Variable | 3.5 | NaN | Clear | Day
2651853 | 201.0 | 2 | 2017-08-30 17:54:40 | 2017-08-30 18:39:23 | 34.073830 | -118.233269 | 0.0 | R | CA | 88.0 | 40.0 | 29.67 | 10.0 | Variable | 4.6 | NaN | Clear | Day
2651854 | 201.0 | 3 | 2017-08-30 18:04:19 | 2017-08-30 19:04:19 | 34.072350 | -117.938385 | 0.0 | R | CA | 98.6 | 27.0 | 29.69 | 10.0 | SSW | 6.9 | NaN | Partly Cloudy | Day
2651855 | 201.0 | 2 | 2017-08-30 18:28:48 | 2017-08-30 18:57:54 | 34.173161 | -118.535988 | 0.0 | R | CA | 100.0 | 18.0 | 29.66 | 10.0 | WNW | 4.6 | NaN | Clear | Day
2651856 | 201.0 | 3 | 2017-08-30 18:41:30 | 2017-08-30 19:11:07 | 34.495808 | -118.623932 | 0.0 | R | CA | 100.0 | 18.0 | 28.85 | 10.0 | WNW | 5.0 | 0.0 | Fair | Day
2651857 | 201.0 | 3 | 2017-08-30 18:59:02 | 2017-08-30 19:27:57 | 34.031322 | -118.433723 | 0.0 | R | CA | 77.0 | 64.0 | 29.69 | 10.0 | SSW | 5.8 | NaN | Clear | Day
2651858 | 201.0 | 3 | 2017-08-30 18:57:52 | 2017-08-30 19:26:11 | 34.106785 | -117.369102 | 0.0 | L | CA | 102.2 | 16.0 | 29.73 | 6.0 | SSW | 5.8 | NaN | Haze | Day
2651859 | 201.0 | 3 | 2017-08-30 19:49:01 | 2017-08-30 20:18:00 | 33.924686 | -118.103981 | 0.0 | R | CA | 88.0 | 39.0 | 29.68 | 10.0 | West | 3.5 | NaN | Clear | Night
2651860 | 201.0 | 2 | 2017-08-30 20:17:21 | 2017-08-30 20:47:21 | 33.729469 | -117.397354 | 0.0 | R | CA | 89.6 | 40.0 | 29.78 | 10.0 | South | 3.5 | NaN | Clear | Night

Duplicate rows

Most frequent

# | TMC | Severity | Start_Time | End_Time | Start_Lat | Start_Lng | Distance(mi) | Side | State | Temperature(F) | Humidity(%) | Pressure(in) | Visibility(mi) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition | Sunrise_Sunset | count
533 | 201.0 | 3 | 2018-09-16 13:24:13 | 2018-09-16 13:51:54 | 33.978249 | -81.195084 | 0.0 | R | SC | 75.0 | 100.0 | 29.74 | 1.0 | SE | 13.8 | 0.01 | Rain | Day | 11
849 | 245.0 | 3 | 2020-11-12 06:28:16 | 2020-11-12 07:12:11 | 25.942879 | -80.187675 | 0.0 | R | FL | 78.0 | 100.0 | 29.91 | 10.0 | S | 5.0 | 0.00 | Partly Cloudy | Night | 11
843 | 245.0 | 3 | 2020-03-12 22:33:35 | 2020-03-12 23:02:42 | 33.585026 | -84.513145 | 0.0 | R | GA | 69.0 | 65.0 | 28.87 | 10.0 | SSW | 9.0 | 0.00 | Partly Cloudy | Night | 9
532 | 201.0 | 3 | 2018-09-16 13:24:12 | 2018-09-16 13:51:54 | 33.978249 | -81.195084 | 0.0 | R | SC | 75.0 | 100.0 | 29.74 | 1.0 | SE | 13.8 | 0.01 | Rain | Day | 8
801 | 241.0 | 3 | 2019-09-16 15:09:55 | 2019-09-16 15:54:02 | 32.907650 | -96.897278 | 0.0 | R | TX | 95.0 | 34.0 | 29.46 | 10.0 | E | 9.0 | 0.00 | Partly Cloudy | Day | 7
374 | 201.0 | 2 | 2020-08-14 07:43:26 | 2020-08-14 08:13:02 | 47.673512 | -117.467613 | 0.0 | R | WA | 56.0 | 45.0 | 27.67 | 10.0 | SE | 8.0 | 0.00 | Fair | Day | 4
429 | 201.0 | 2 | 2020-10-27 06:54:07 | 2020-10-27 07:53:21 | 38.713409 | -90.284042 | 0.0 | R | MO | 37.0 | 86.0 | 29.65 | 6.0 | NNE | 7.0 | 0.00 | Light Rain | Night | 4
658 | 201.0 | 3 | 2020-04-10 19:53:37 | 2020-04-10 20:21:52 | 38.843868 | -94.529579 | 0.0 | R | MO | 52.0 | 37.0 | 28.75 | 10.0 | SSE | 9.0 | 0.00 | Cloudy | Night | 4
845 | 245.0 | 3 | 2020-06-09 09:38:20 | 2020-06-09 10:07:39 | 29.768175 | -95.265442 | 0.0 | R | TX | 87.0 | 69.0 | 29.73 | 10.0 | SSW | 9.0 | 0.00 | Fair | Day | 4
166 | 201.0 | 2 | 2019-07-16 08:41:16 | 2019-07-16 09:40:58 | 33.414162 | -82.014641 | 0.0 | L | GA | 82.0 | 76.0 | 29.97 | 5.0 | CALM | 0.0 | 0.00 | Fair | Day | 3
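
The table above groups fully identical rows and reports how often each occurs. A minimal standard-library sketch of the same idea, assuming rows are represented as tuples; `most_frequent_duplicates` is a hypothetical helper, and the rows in the example are toy values, not records from this dataset:

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, n) for row, n in counts.items() if n > 1]
    # Sort by descending count; sort() is stable, so equal counts keep first-seen order.
    duplicated.sort(key=lambda item: item[1], reverse=True)
    return duplicated[:top]


rows = [
    ("OH", 201.0, "Rain"),
    ("CA", 201.0, "Clear"),
    ("OH", 201.0, "Rain"),
    ("OH", 201.0, "Rain"),
    ("CA", 201.0, "Clear"),
    ("TX", 241.0, "Fair"),
]
result = most_frequent_duplicates(rows)
for row, n in result:
    print(row, n)
```

Note that rows differing in a single field, such as the entries indexed 533 and 532 above (Start_Time 13:24:13 vs 13:24:12), count as separate groups under this exact-match definition.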